Extraction of Semantic Information from Web Resources
نویسنده
چکیده
The paper addresses a problem of extraction of semantic information from Czech texts from the Web. The method described in this paper exploits existing linguistic tools created originally for a syntactically annotated corpus, Prague Dependency Treebank (PDT 2.0). We are working on development of a system which captures text of web-pages, annotates it linguistically by linguistic tools, extracts data and interprets the extracted data semantically in terms of web ontologies. The proposed extraction method is based on extraction rules – tree queries, which are adopted from the Netgraph application. Semantic interpretation of these rules provides semantics of the extracted data. We present some initial experiments in the domain of reports of traffic accidents.
منابع مشابه
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
متن کاملAnnotation sémantique des ressources Web : Etat de lart et perspectives de recherche
The semantic annotation problem of Web resources interest researchers from different communities to improve the information retrieval process. Indeed, semantic annotation of documents and Web pages is a hard work. An automation of construction process is essential to modernize information retrieval in the Semantic Web. In this paper, we present a new semantic annotation approach of Web resource...
متن کاملSemantic Based Information Extraction from Web
Extraction of information from web is a challenging task. The information stored in a web may be structured or unstructured information. The structured information provides enhanced knowledge which helps to retrieve relevant documents. It helps the user to understand particular domain. This paper explores the importance of information extraction using semantics. It enables the users to discover...
متن کاملA New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
متن کاملTowards Cross-Media Feature Extraction
In this paper we describe past and present work dealing with the use of textual resources, out of which semantic information can be extracted in order to provide for semantic annotation and indexing of associated image or video material. Since the emergence of semantic web technologies and resources, entities, relations and events extracted from textual resources by means of Information Extract...
متن کاملInformation Extraction from Wikipedia Using Pattern Learning
In this paper we present solutions for the crucial task of extracting structured information from massive free-text resources, such as Wikipedia, for the sake of semantic databases serving upcoming Semantic Web technologies. We demonstrate both a verb frame-based approach using deep natural language processing techniques with extraction patterns developed by human knowledge experts and machine ...
متن کامل